Deep LSTM


Sequencer: Deep LSTM for Image Classification

Tatsunami, Yuki, Taki, Masato

Neural Information Processing Systems; arXiv.org Artificial Intelligence
In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of the Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, achieves 84.6% top-1 accuracy on ImageNet-1K alone. Moreover, we show that it has good transferability and robust resolution adaptability on a double-resolution band.
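The two-dimensional decomposition described in the abstract can be illustrated with a minimal PyTorch sketch: one bidirectional LSTM scans each column of the feature map (vertical), another scans each row (horizontal), and their outputs are concatenated and projected back to the channel dimension. The class and parameter names below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class BiLSTM2D(nn.Module):
    """Sketch of a Sequencer2D-style mixing block (assumed structure):
    vertical + horizontal bidirectional LSTMs over a (B, H, W, C) feature map."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.vertical = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.horizontal = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        # Each bidirectional LSTM outputs 2*hidden channels; project 4*hidden back to dim.
        self.proj = nn.Linear(4 * hidden, dim)

    def forward(self, x):
        # x: (B, H, W, C) channels-last feature map
        B, H, W, C = x.shape
        # Vertical pass: treat each column as a length-H sequence.
        v, _ = self.vertical(x.permute(0, 2, 1, 3).reshape(B * W, H, C))
        v = v.reshape(B, W, H, -1).permute(0, 2, 1, 3)
        # Horizontal pass: treat each row as a length-W sequence.
        h, _ = self.horizontal(x.reshape(B * H, W, C))
        h = h.reshape(B, H, W, -1)
        return self.proj(torch.cat([v, h], dim=-1))
```

Because each LSTM only runs along one axis, the sequence lengths stay short (H or W rather than H*W), which is one plausible reason the decomposition helps at higher resolutions.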


Transformer-based encoder-encoder architecture for Spoken Term Detection

Švec, Jan, Šmídl, Luboš, Lehečka, Jan

arXiv.org Artificial Intelligence

The paper presents a method for spoken term detection based on the Transformer architecture. We propose an encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of the putative hit is computed using the calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer, and the proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets based on the USC Shoah Foundation Visual History Archive (MALACH).

In this work, we do not focus on the direct processing of the input speech signal. Instead, we use the speech recognizer to convert an audio signal into a graphemic recognition hypothesis. The representation of speech at the grapheme level allows preprocessing the input audio into a compact confusion network and further to a sequence of embedding vectors. In [7], we proposed a Deep LSTM architecture for spoken term detection, which uses the projection of both the input speech and searched term into a shared embedding space. The hybrid DNN-HMM speech recognizer produced phoneme confusion networks representing the input speech. The DNN-HMM speech recognizer can be replaced with the Wav2Vec
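The core scoring idea, projecting the recognized hypothesis and the searched term into a shared embedding space and scoring the putative hit with a calibrated dot product, can be sketched as follows. The pooling, function name, and calibration parameters `a` and `b` are assumptions for illustration; the paper's actual encoders are learned BERT-like models.

```python
import numpy as np

def std_score(hyp_emb, term_emb, a=1.0, b=0.0):
    """Score a putative hit as a calibrated dot product in a shared space.

    hyp_emb:  (T, d) embedding sequence for the recognized hypothesis
    term_emb: (S, d) embedding sequence for the searched term
    a, b:     assumed scalar calibration parameters (learned in practice)
    """
    h = hyp_emb.mean(axis=0)          # mean-pool hypothesis to a single vector
    t = term_emb.mean(axis=0)         # mean-pool term to a single vector
    logit = a * float(h @ t) + b      # calibrated dot product
    return 1.0 / (1.0 + np.exp(-logit))  # squash to a (0, 1) detection score
```

Calibration matters here because raw dot products are unbounded; mapping them through a learned affine transform and a sigmoid gives a thresholdable detection score.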


Deep LSTM based Malware Analysis

#artificialintelligence

Malware development has seen diversity in terms of architecture and features. This advancement in the capabilities of malware poses a severe threat and opens new research dimensions in malware detection. This study focuses on metamorphic malware, the most advanced member of the malware family. It is nearly impossible for anti-virus applications using traditional signature-based methods to detect metamorphic malware, which makes it difficult to classify this type of malware accurately. Recent research literature about malware detection and classification discusses this issue related to malware behaviour.


Deep LSTM-Based Goal Recognition Models for Open-World Digital Games

Min, Wookhee (North Carolina State University) | Mott, Bradford (North Carolina State University) | Rowe, Jonathan (North Carolina State University) | Lester, James (North Carolina State University)

AAAI Conferences

Player goal recognition in digital games offers the promise of enabling games to dynamically customize player experience. Goal recognition aims to recognize players’ high-level intentions using a computational model trained on a player behavior corpus. A significant challenge is posed by devising reliable goal recognition models with a behavior corpus characterized by highly idiosyncratic player actions. In this paper, we introduce deep LSTM-based goal recognition models that handle the inherent uncertainty stemming from noisy, non-optimal player behaviors. Empirical evaluation indicates that deep LSTMs outperform competitive baselines including single-layer LSTMs, n-gram encoded feedforward neural networks, and Markov logic networks for a goal recognition corpus collected from an open-world educational game. In addition to metric-based goal recognition model evaluation, we investigate a visualization technique to show a dynamic goal recognition model’s performance over the course of a player’s goal-seeking behavior. Deep LSTMs, which are capable of both sequentially and hierarchically extracting salient features of player behaviors, show significant promise as a goal recognition approach for open-world digital games.
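A deep LSTM goal recognizer of the kind the abstract describes can be sketched as a stacked LSTM over the player's action sequence, emitting a goal distribution at every step so the model's belief can be tracked over the course of goal-seeking behavior. The layer sizes, embedding, and class names here are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class DeepLSTMGoalRecognizer(nn.Module):
    """Sketch: stacked (deep) LSTM over a sequence of discrete player actions,
    predicting goal logits at each timestep. Hyperparameters are assumed."""

    def __init__(self, n_actions, n_goals, emb=32, hidden=64, layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_actions, emb)
        # layers >= 2 makes the LSTM "deep": higher layers re-encode
        # the lower layers' summaries of the behavior sequence.
        self.lstm = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_goals)

    def forward(self, actions):
        # actions: (B, T) integer action ids
        out, _ = self.lstm(self.embed(actions))
        return self.head(out)  # (B, T, n_goals) per-step goal logits
```

Emitting per-step logits rather than a single final prediction is what enables the visualization described above: the predicted goal distribution can be plotted at each point in the player's trajectory.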